Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

نویسندگان

Ryan Rifkin

Jake Bouvrie

Ken Schutte

Sharat Chikkerur

Minjoon Kouh

Tony Ezzat

Tomaso Poggio

چکیده

A preliminary set of experiments are described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectrotemporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. 1 This memo is identical to previous CSAIL Technical Report MIT-CSAIL-TR-2007-007 dated January 2007, except for the addition of some new references.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...

متن کامل

Language identification using spectro-temporal patch features

We present a novel approach for automatic Language Identification (LID) using spectro-temporal patch features. Our approach is based on the premise that speech and spoken phenomena are characterized by typical visible patterns in timefrequency representations of the signal, and that the manner of occurrence of these patterns is language specific. To model this, we derive a randomly selected lib...

متن کامل

Hierarchical Spectro-Temporal Models for Speech Recognition

We seek to explore computational approaches for audition that are inspired by computational visual neuroscience. In particular, we seek to leverage recent progress over the past few years in building a biologically-faithful hierarchical, feed-forward system for visual object recognition [13,14]. The system, which was designed to closely match the currently known feed-forward path in the ventral...

متن کامل

Speaker independent bimodal phonetic recognition experiments

A speaker independent bimodal phonetic classification experiment regarding the Italian plosive consonants is described. The phonetic classification scheme is based on a feed forward recurrent back-propagation neural network working on audio and visual information. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by a sp...

متن کامل

Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling

Unvoiced stops are rapidly varying sounds with acoustic cues to place identity linked to the temporal dynamics. Neurophysiological studies have indicated the importance of joint spectro-temporal processing in the human perception of stops. In this study, two distinct approaches to modeling the spectro-temporal envelope of unvoiced stop phone segments are investigated with a view to obtaining a ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2007

Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

نویسندگان

چکیده

منابع مشابه

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

Language identification using spectro-temporal patch features

Hierarchical Spectro-Temporal Models for Speech Recognition

Speaker independent bimodal phonetic recognition experiments

Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling

عنوان ژورنال:

اشتراک گذاری